Abstract

Tidal level records are indispensable for the safe
navigation of ships in harbours and for the planning and
safe execution of coastal engineering projects. Harmonic
analysis, the most widely used method for the prediction of
tides, involves the computation of harmonic constants, which
requires a large amount of data. Numerical models likewise
require large amounts of data in the form of bathymetry and
boundary conditions. Artificial Neural Networks (ANN) have
been widely applied in coastal engineering over the last two
decades to a variety of problems, including time series
forecasting of waves and tides, prediction of sea-bed
liquefaction and scour depth, and estimation of design
parameters of coastal engineering structures. The ability of
ANN to learn highly complex interrelationships from the
provided data sets with the help of a learning algorithm,
along with built-in error tolerance and a modest data
requirement, makes it a powerful modelling tool in the
research community. In this study, tide levels are predicted
using a Feed-Forward Back Propagation (FFBP) network with
the Levenberg-Marquardt (LM) algorithm. Field data from the
Karwar tide gauge station are used to train and test the
network. The effect of network architecture on model
performance is also studied. The results are compared with
predictions made using the Auto Regressive Integrated
Moving Average (ARIMA) technique. ANN provides better
predictions than ARIMA. It is concluded that ANN can be
used to predict tides at Karwar successfully using
short-term hourly tide level data.

Introduction

Tidal level plays a major role in activities such as
harbour planning, determination of Mean Sea Level
(MSL) and navigation depth, drawing of marine
boundaries, storm surge monitoring and even
disposal of sediments. Monitoring and prediction of
tidal level are thus important for the smooth planning
and execution of these activities. Tidal ranges are
largely governed by the gravitational pull of the Sun and
the Moon on the oceanic water body; this component of
the tide is called the 'astronomical tide'. Other factors
such as bottom topography, sea-level pressure and wind
speed also contribute to the tidal range; this contribution
is called the 'non-astronomical tide'. Tides are
traditionally predicted by the harmonic method, which
accounts for the constituents of the astronomical tide. It
is given by the equation

$$H = H_0 + A\cos(at + \alpha) + B\cos(bt + \beta) + C\cos(ct + \gamma) + \dots \qquad (1)$$

where $H$ is the height of the tide at the location, $H_0$ is
the MSL, $A$, $B$, $C$ are the amplitudes of the constituents
and $(at + \alpha)$, $(bt + \beta)$, $(ct + \gamma)$ are the
phases of the constituents. Once the harmonic constants have
been determined for a location by the method of least
squares (Doodson, 1928) or by Kalman filtering as used by
Yen et al. (1996), they can be used to predict the tides by
recombining them with the astronomical relations prevailing
at the time for which predictions are required. For a
detailed account of harmonic analysis and prediction
of tides, see Schureman (1971). The major
drawback of this method is the large amount of
continuous tide data required to determine the
tidal constituents (Reid, 1990). Moreover, the method
does not take into consideration the various
hydrodynamic and meteorological parameters. Although
Kalman filtering requires less data, its
predictions are limited to short durations. Numerical
models such as the finite difference method require accurate
boundary conditions and geometric information (Chen
et al., 2007). Although including more
constituents in the harmonic analysis improves
accuracy, it increases memory requirements and
calculation time. In addition, harmonic analysis and
Kalman filtering are ineffective in
supplementing lost tide data, especially when tidal
level changes are complex in nature and the available
data are incomplete (Liang et al., 2008). Further, harmonic
analysis is restricted to the prediction of tides at a
particular station only, as tidal constituents vary from
one place to another.
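
As a minimal sketch of Eq. (1), the snippet below evaluates a harmonic tide prediction from a mean level and a list of constituents, each given as amplitude, angular speed and phase. The function name `harmonic_tide` and the amplitudes and phases are illustrative assumptions, not the harmonic constants of any real station; only the M2 and S2 speeds (about 28.984 and 30.000 degrees per hour) are standard values.

```python
import math


def harmonic_tide(t_hours, mean_level, constituents):
    """Evaluate H = H0 + sum(A * cos(speed * t + phase)), as in Eq. (1).

    constituents: iterable of (amplitude, speed_deg_per_hour, phase_deg).
    """
    height = mean_level
    for amplitude, speed_deg, phase_deg in constituents:
        angle = math.radians(speed_deg * t_hours + phase_deg)
        height += amplitude * math.cos(angle)
    return height


# Hypothetical constants for illustration only (amplitudes in cm).
EXAMPLE_CONSTITUENTS = [
    (50.0, 28.9841042, 0.0),  # M2, principal lunar semi-diurnal
    (20.0, 30.0, 0.0),        # S2, principal solar semi-diurnal
]

if __name__ == "__main__":
    for hour in range(0, 25, 6):
        level = harmonic_tide(hour, 100.0, EXAMPLE_CONSTITUENTS)
        print(hour, round(level, 2))
```

In practice the difficulty lies not in evaluating this sum but in estimating the amplitudes and phases, which is where the long continuous records mentioned above are needed.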

Artificial Neural Networks have been applied in the
field of coastal engineering to overcome the major
drawback of the excessive data requirement of existing
methods ever since Mase et al. (1995) used them for the
stability analysis of rubble mound breakwaters. Later,
Tsai and Lee (1999) used a Back Propagation Network
(BPN) with the gradient descent method to predict
tides at Taichung harbour and Mirtuor coast. The study
revealed that long-duration predictions can be made
using a data set of very short duration. The
determination of harmonic constants was not required
in this method, as predictions were made by models
trained on past tidal records. In a similar study by Lee
and Jeng (2002), conducted to check the applicability of
ANN under different tide conditions (diurnal,
semi-diurnal and mixed) at three different stations,
satisfactory results were obtained at all the stations.
Lee et al. (2002) and Lee (2004) showed that ANN can be
used to supplement missing data. The study also made
use of a BPN with no hidden layers to identify the major
tidal constituents of the location out of 69 tidal
constituents. It showed that two months of tidal records
were required to get clear results, whereas one month's
data is enough to give an idea of the type of tide existing
at a location. A year of tide level data is required for the
same purpose if the constituents are to be found by the
traditional harmonic analysis method. In a comparative
study of hydrodynamic and ANN models at
two different stations, the hydrodynamic model
outperformed ANN in terms of the correlation coefficient
(CC) (Vivekanandan and Singh, 2002). Although the results
were marginally (of the order of 0.020) in favour of
the hydrodynamic models, these models
required initial boundary conditions as input. The
study concluded that ANN can be used as a substitute
for hydrodynamic models considering its sparse data
requirement and shorter computational time.
The regional neural network water level (RNN-WL)
prediction model, developed to predict sea water level
at a station using data obtained from other stations in
the region, provides a cost-effective way to obtain
long-term tidal data for regional stations where
established tide observation gauges are expensive and
could instead be relocated to other new sites (Huang et
al., 2003).

In a study by Rajasekaran et al. (2006), in which
functional neural networks (FNN) and sequential
learning neural networks (SLNN) were applied to
predict tide levels, the FNN was found to be more
accurate. FNN use domain knowledge rather than the
data knowledge used by conventional ANN; that is to
say, FNN learn functions, whereas conventional ANN
learn weights. SLNN, on the other hand, involve a
large number of iterations, of the order of 140,000, but
take less computational time, mainly because of the
presence of a single neuron in the hidden layer. Chang
and Lin (2006) used seven parameters affecting the tide
generating forces from tide theory as input to create the
Tide Generating Force-Neural Network (TGF-NN).
The model was trained using one year of tidal data, and
the same data were used to find the harmonic constants
in a harmonic method consisting of 60 constituents,
HM (60). The results showed that TGF-NN is as
powerful as HM (60) when one year of tidal data is
used with a 2 hour lead time. The model was also
compared with other tide prediction models, namely
NAO.99b, HM (26) (harmonic analysis with 26
constituents) and the Response-Orthotide (R-O) method,
and the results showed TGF-NN outperforming all the
other models.

Chen et al. (2007) combined ANN and wavelet
analysis to extend the predictions to a 5 year duration
and to improve the prediction quality, and formed
various models for locations in and around Taiwan
and the South China Sea. Makarynska and Makarynskyy
(2008) used a feed forward neural network with the
Resilient Back Propagation (RBP) learning algorithm
to predict tide levels. The RBP learning algorithm
provided quicker computation. They also modelled a
network to fill in missing data using the 12 h of data
before and after the gap as inputs and the residual
between interpolated and observed tide levels as the
output target. The results were satisfactory, with CC
values between 0.93 and 0.96 using various lengths of
tide data as input. In a recent study by Filippo
et al. (2012) at two stations in Brazil, the effect of
the meteorological parameters wind speed and sea level
atmospheric pressure on tidal predictions was
studied by incorporating 3 hour wind speed data and
atmospheric pressure data in the input along with
tide data calculated by the harmonic analysis method.
One year of tide data was used for training. The
importance of the meteorological parameters was
highlighted by the results, which showed a
considerable decrease in error from 26% to 12% at
station 1 and from 31% to 2% at station 2.

Method 

Artificial Neural Networks (ANN) 

The development of ANN can be attributed to attempts
to mimic the working pattern of the human
brain. Its success lies in its ability to exploit the
non-linear relationship between input and output data by
continuously adapting itself to the information
provided to it by means of a learning process.
ANN can be classified by network type into
feed forward and feedback (recurrent) networks. The
basic difference between the two is that in feed
forward networks the information is passed from one
layer to the next in a forward manner until the output is
obtained in the output layer, whereas in feedback
networks the output obtained in the output layer is fed
back into the network through the input layer, so this
type of network has at least one loop in
its structure. ANN can also be classified by
learning type into supervised and unsupervised
learning. In supervised learning, a set of input data
and the corresponding output is fed into the network, and
the calculated output is compared with the target output
(the output values given to the network). The
difference between the two is the error, and through
the various error correction measures available the
network adapts itself until the error reaches a minimum
value or a fixed number of iterations is complete. In
unsupervised learning, the networks are tuned to
statistical regularities of the input data by learning
rules such as radial basis functions; here no
input-output data set is presented to the network.


FIG. 1 BASIC MODEL OF ANN


Fig. 1 shows the basic mathematical model of ANN,
where $x_1, x_2, x_3, \dots, x_n$ are the input parameters and
$w_{k1}, w_{k2}, w_{k3}, \dots, w_{kn}$ are the weights associated
with the connections, i.e. the synaptic weights from
input neuron 'i' to neuron 'k', with i = 1 to n. Neuron 'k'
is the summing junction, where the net input is given by

$$u_k = \sum_{i=1}^{n} w_{ki}\, x_i \qquad (2)$$

and

$$v_k = u_k + b_k \qquad (3)$$

where $b_k$ is the bias value at the k-th node. The final
output $y_k$ is the transformed weighted sum $v_k$; in
other words, $y_k$ is a function of $v_k$, represented by

$$y_k = \varphi(v_k) \qquad (4)$$

where $\varphi$ is the transfer function used to convert the
summed input. A non-linear sigmoid function, which
is monotonically increasing and continuously
differentiable, is the commonly adopted transfer
function. It is mathematically expressed as

$$y_k = \varphi(v_k) = \frac{1}{1 + \exp(-v_k)} \qquad (5)$$

Other transfer functions such as hardlim, logsig, tansig
and purelin can also be used to get the desired result.
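
The neuron model of Eqs. (2) to (5) can be sketched in a few lines of Python. The function names `logsig`, `tansig` and `purelin` simply mirror the transfer function names mentioned in the text, and `neuron_output` is a hypothetical helper written for this illustration; the input values, weights and bias are arbitrary.

```python
import math


def logsig(v):
    """Logistic sigmoid of Eq. (5): 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def tansig(v):
    """Hyperbolic tangent sigmoid, with output in (-1, 1)."""
    return math.tanh(v)


def purelin(v):
    """Linear transfer function: the output equals the net input."""
    return v


def neuron_output(inputs, weights, bias, transfer=logsig):
    """Eqs. (2)-(4): u_k = sum(w_ki * x_i), v_k = u_k + b_k, y_k = phi(v_k)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    u_k = sum(w * x for w, x in zip(weights, inputs))
    v_k = u_k + bias
    return transfer(v_k)


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [0.5, -0.25]
    for transfer in (logsig, tansig, purelin):
        print(transfer.__name__, neuron_output(x, w, 0.1, transfer))
```

A layer is this computation repeated for every neuron 'k', and a feed forward network chains such layers so that the outputs of one become the inputs of the next.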

The most commonly used learning algorithm in
coastal engineering applications is the gradient descent
algorithm, in which the calculated global error is
propagated backward to the input layer through the
weight connections, during which the weights are
updated in the direction of steepest descent, i.e. in the
direction opposite to the gradient. The
overall objective of any learning algorithm is to reduce
the global error E, defined as



$$E = \sum_{p} E_p \qquad (6)$$

$$E_p = \sum_{k=1}^{N} (o_k - t_k)^2 \qquad (7)$$


where $E_p$ is the error at the p-th training pattern, $o_k$ is
the output obtained from the network at the k-th output
node, $t_k$ is the target output at the k-th output node and
N is the total number of output nodes. The
Levenberg-Marquardt algorithm (Levenberg, 1944;
Marquardt, 1963) used in this study can be written as

$$W_{new} = W_{old} - [J^T J + \gamma I]^{-1} J^T E(W_{old}) \qquad (8)$$

where J is the Jacobian of the error function E, I is the
identity matrix and $\gamma$ is the parameter that defines
the iteration step size (Panizzo and Briganti, 2007;
Gunaydin, 2008). It minimizes the error function
while trying to keep the step between the old weight
configuration ($W_{old}$) and the new updated one ($W_{new}$)
small.
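
To make the update of Eq. (8) concrete, the following sketch applies it to a deliberately tiny model: a single linear neuron o = w*x + b with two weights, so that the 2x2 system can be solved in closed form. This is an illustration under stated assumptions, not the training routine used in the study; `lm_step`, `sum_squared_error`, the `damping` argument (playing the role of the step parameter in Eq. (8)) and the toy data are all hypothetical.

```python
def sum_squared_error(weights, xs, targets):
    """Sum of squared differences between outputs and targets."""
    w, b = weights
    return sum(((w * x + b) - t) ** 2 for x, t in zip(xs, targets))


def lm_step(weights, xs, targets, damping):
    """One update W_new = W_old - (J^T J + damping * I)^-1 J^T e, as in Eq. (8)."""
    w, b = weights
    # Residual vector e: network output minus target for every pattern.
    residuals = [(w * x + b) - t for x, t in zip(xs, targets)]
    # Each Jacobian row is [d e_i / d w, d e_i / d b] = [x_i, 1].
    a11 = sum(x * x for x in xs) + damping   # (J^T J + damping * I)[0][0]
    a12 = sum(xs)                            # off-diagonal term
    a22 = len(xs) + damping                  # (J^T J + damping * I)[1][1]
    g1 = sum(x * e for x, e in zip(xs, residuals))  # (J^T e)[0]
    g2 = sum(residuals)                             # (J^T e)[1]
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("singular system; increase the damping")
    # Solve the 2x2 system with the explicit inverse.
    delta_w = (a22 * g1 - a12 * g2) / det
    delta_b = (a11 * g2 - a12 * g1) / det
    return (w - delta_w, b - delta_b)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    targets = [1.0, 3.0, 5.0, 7.0]  # generated from t = 2x + 1
    weights = (0.0, 0.0)
    for _ in range(5):
        weights = lm_step(weights, xs, targets, damping=1.0)
        print(weights, sum_squared_error(weights, xs, targets))
```

With zero damping the step reduces to a Gauss-Newton step, while a large damping shortens the step towards a small gradient descent move; practical implementations adjust this parameter between iterations.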

The performance of the network is measured in terms
of performance functions such as the sum squared
error (SSE), mean squared error (MSE), root mean
squared error (RMSE) and coefficient of correlation
(CC or 'r') between the predicted and the observed
values. A lower value of RMSE and a
higher value of CC indicate better performance of the
network.
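
The performance measures named above can be written out as follows. This is a plain-Python sketch using the usual textbook definitions (Pearson's coefficient for 'r'); the function names and the sample values are assumptions made for illustration.

```python
import math


def sse(observed, predicted):
    """Sum squared error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def mse(observed, predicted):
    """Mean squared error."""
    return sse(observed, predicted) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(observed, predicted))


def correlation(observed, predicted):
    """Pearson coefficient of correlation between the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 4.0, 6.0, 8.0]
    print(mse(observed, predicted), rmse(observed, predicted))
    print(correlation(observed, predicted))
```

The sample series shows why both kinds of measure are reported: the predictions are perfectly correlated with the observations yet carry a large squared error, because correlation is insensitive to scale and offset.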

The major drawback of the Feed Forward Back
Propagation (FFBP) network is that it can get
trapped in local minima. Over-learning
due to a high learning rate may lead to
oscillatory behaviour of the network. A very large
number of neurons in the hidden layer leads to
complex learning and may require a large number of
iterations to terminate the process. Too few
input data make it difficult for the network to learn
all the relationships between the input and
target parameters. Too much variation in the
data set will also diminish the accuracy of the
network. These setbacks can, however, be
overcome by selecting the optimum architecture of the
network using techniques such as sensitivity
analysis to select the most effective input parameters and
reduce the network size, which decreases the
computational time required. Using generalization
techniques such as Principal Component Analysis (PCA)
to improve the quality of the input data will also help in
improving prediction quality. Other ANN models
based on the conjugate gradient algorithm, radial basis
functions, the cascade correlation algorithm and recurrent
neural networks can also be used to overcome these
drawbacks (ASCE Task Committee, 2000). Recently, many
studies have combined ANN with statistical methods and
other Artificial Intelligence (AI) methods such as Genetic
Programming (GP) and Fuzzy Logic (FL) systems to
improve both forecasting accuracy and duration.

The non-linear, data-driven, self-adaptive approach of
ANN, as opposed to the high data requirement of numerical
models together with their need for initial boundary
conditions and the geometry of the study area in ocean
engineering applications, makes ANN an attractive and
powerful modelling tool when the underlying data
relationship is unknown. Many studies have shown
that once a network is validated for a particular task
it can be successfully applied in practical field
applications as well (Londhe and Panchang, 2006). A
detailed account of the theory and mathematical basics
of ANN can be found in Haykin (2006).

Auto Regressive Integrated Moving Average (ARIMA) 

The general Auto Regressive Integrated Moving Average
(ARIMA) model introduced by Box and Jenkins (1976) includes
autoregressive as well as moving average parameters,
and explicitly includes differencing in the formulation
of the model. Specifically, the three types of
parameters in the model are the autoregressive
parameters (p), the number of differencing passes (d)
and the moving average parameters (q). The parameters
are estimated so that the sum of squared residuals is
minimized. The estimates of the parameters are used
in forecasting to calculate new values of the series
and confidence intervals for those predicted values.
The estimation process is performed on transformed
(differenced) data; hence, before the forecasts are
generated, the series needs to be integrated so that the
forecasts are expressed in values compatible with the
input data. This automatic integration feature is
represented by the letter I in the name of the
methodology (ARIMA = Auto-Regressive Integrated Moving
Average).
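
The differencing passes (d) and the integration step that undoes them can be illustrated with a short sketch. The helper names `difference` and `integrate` are hypothetical, and the short series is made up; statistical packages perform these steps internally.

```python
def difference(series, d=1):
    """Apply d differencing passes.

    Returns (differenced, heads), where heads[i] is the first value of the
    series before pass i; these values are needed to undo the differencing.
    """
    current = list(series)
    heads = []
    for _ in range(d):
        heads.append(current[0])
        current = [b - a for a, b in zip(current, current[1:])]
    return current, heads


def integrate(differenced, heads):
    """Undo the differencing by cumulative summation, innermost pass first."""
    current = list(differenced)
    for head in reversed(heads):
        restored = [head]
        for step in current:
            restored.append(restored[-1] + step)
        current = restored
    return current


if __name__ == "__main__":
    levels = [3.0, 5.0, 8.0, 12.0, 17.0]
    diffed, heads = difference(levels, d=2)
    print(diffed)                    # second differences of the series
    print(integrate(diffed, heads))  # recovers the original levels
```

Forecasts produced on the differenced scale are passed through the same integration step so that they are expressed in the units of the original tide levels.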

The ARMA model, proposed by Box and Jenkins (1970)
with the idea of a linear filter to estimate stochastic
data, is defined by the equation

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + \xi_t - \theta_1 \xi_{t-1} - \theta_2 \xi_{t-2} - \dots - \theta_q \xi_{t-q} \qquad (9)$$

in which p and q are the orders of the autoregressive
and moving average models respectively, $x_t$ and $x_{t-i}$
are the observations at time instants t and t-i, $\xi_t$ and
$\xi_{t-i}$ are the residual values, and $\phi_1$ to $\phi_p$ and
$\theta_1$ to $\theta_q$ are the finite sets of weighting
parameters. For the ARMA model, larger amounts of
measured data provide a better prediction, but a longer
computational time is then required for parameter
identification. Also, the parameters involved in ARMA
are affected by changes in environmental or
sociological variables.
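
As a hedged illustration of Eq. (9), the sketch below computes a one-step-ahead ARMA forecast from past observations and past residuals, with the unknown current residual set to its expected value of zero. The function name `arma_forecast` and the coefficient values are assumptions chosen for the example; they are not fitted to the Karwar data.

```python
def arma_forecast(history, residuals, ar_coeffs, ma_coeffs):
    """One-step forecast from Eq. (9) with the current residual taken as zero.

    history:   past observations, oldest first (at least p values)
    residuals: past residuals, oldest first (at least q values)
    ar_coeffs: autoregressive weights, lag 1 first (length p)
    ma_coeffs: moving average weights, lag 1 first (length q)
    """
    p, q = len(ar_coeffs), len(ma_coeffs)
    if len(history) < p or len(residuals) < q:
        raise ValueError("not enough past values for the requested orders")
    # history[-1 - i] is the observation at lag i + 1.
    ar_part = sum(ar_coeffs[i] * history[-1 - i] for i in range(p))
    ma_part = sum(ma_coeffs[j] * residuals[-1 - j] for j in range(q))
    # The moving average terms enter Eq. (9) with a minus sign.
    return ar_part - ma_part


if __name__ == "__main__":
    history = [1.0, 2.0]
    residuals = [0.5]
    print(arma_forecast(history, residuals, [0.5, 0.25], [0.4]))
```

Forecasting further ahead repeats this step, feeding each forecast back in as the newest observation and using zero for residuals that have not yet been observed.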


Materials 



FIG. 2 LOCATION OF STUDY AREA

FIG. 3 THE ANN STRUCTURE FOR ONE MONTH PREDICTION
USING ONE MONTH DATA, HAVING SINGLE INPUT AND
OUTPUT LAYER NODES.


Three years of hourly gauge-measured tide level data
from December 2008 to December 2011 at the Karwar
station (14°48.183'N 74°06.865'E), obtained from the
National Institute of Oceanography, Goa, were utilized
in the study. Predictions were carried out for varying
durations using a week's data, a month's data
and one year's data as input. A three-layered FFBP
network with an input layer, a hidden layer and an output
layer was used with the Levenberg-Marquardt (LM)
algorithm for the prediction of tides. The
tangent-sigmoid (tansig) and linear (purelin) transfer
functions were used in the hidden and output layers
respectively. The data were normalised to fall in the
range of -1 to 1 to speed up the learning process. One
thousand epochs were set as the stopping criterion of
the training process for all the predictions
undertaken.
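
The text does not state which mapping was used for the normalisation; a common choice, assumed here, is linear min-max scaling. The sketch below maps a series to the range -1 to 1 and back again. The helper names `normalise` and `denormalise` and the sample tide levels are hypothetical.

```python
def normalise(values, low=-1.0, high=1.0):
    """Linearly map values so that min -> low and max -> high.

    Returns the scaled list and the (min, max) pair needed to invert it.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values], (v_min, v_max)


def denormalise(scaled, bounds, low=-1.0, high=1.0):
    """Invert normalise() using the stored (min, max) pair."""
    v_min, v_max = bounds
    scale = (v_max - v_min) / (high - low)
    return [v_min + (s - low) * scale for s in scaled]


if __name__ == "__main__":
    tide_levels = [0.0, 50.0, 100.0, 75.0]  # made-up values in cm
    scaled, bounds = normalise(tide_levels)
    print(scaled)
    print(denormalise(scaled, bounds))
```

The bounds obtained from the training data should be reused when scaling the testing inputs and when converting network outputs back to tide levels, so that both phases share one mapping.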

Results and Discussion 

The four weeks' prediction was carried out using one
week's hourly data from 1/1/2009 to 7/1/2009 as input.
The target data set comprised readings of four weeks'
duration from 8/1/2009 to 4/2/2009 (28 days). A week's
data of 168 time steps represents a single node in
this case; similarly, the output layer consists of four
nodes for four weeks of data. Subsequent weeks were
given in similar fashion as input and target for testing
the trained network. The 'r' values showed a marginal
increase in the 4 weeks' prediction, which might be due to
the increased number of target values available for
network generalisation. However, the 'r' value
decreased for the 12 weeks' tide level prediction, as the
range of one week's input data was too narrow to predict
tide levels over a long duration of 12 weeks. The number
of neurons in the hidden layer was increased by one after
every prediction. The best performance was obtained
with six and two neurons in the hidden layer for the
4 weeks' and 12 weeks' predictions respectively. When
the number of neurons was increased beyond these
values, the training 'r' values increased considerably
but the testing 'r' values fell drastically, hinting at
overfitting of the network. This phenomenon refers to a
state in which a large number of neurons in the hidden
layer increases the complexity of the network, but
there are not enough patterns to be learnt
by the network from the given input-target data sets.
Moreover, in both cases the prediction duration
(4 weeks and 12 weeks) is large compared to the input
data of one week. Naturally, the range of the targets will
be greater than that of the input provided, weakening the
prediction capability of the network when a new data
set is fed to it.


TABLE 1 MEAN SQUARE ERROR ('MSE') AND COEFFICIENT OF
CORRELATION ('R') VALUES FOR TIDE LEVEL PREDICTIONS OF 4 WEEKS
AND 12 WEEKS.

Network      'mse'                   'r'
structure    Training    Testing     Training    Testing
1-6-4        1589.7      1656.1      0.625       0.579
1-2-12       1943.4      2461.5      0.463       0.299


Monthly prediction involved feeding the network with
a month's data of 720 data points in a single input
node. The hourly tide level data of the year 2009 were
divided into twelve sets of 720 data points, each
corresponding to 30 days of observation; hence the
yearly data comprised 1/1/2009 to 26/12/2009 (360 days).
For the one month tide level prediction, data from
1/1/2009 to 30/1/2009 and from 31/1/2009 to 1/3/2009
were given as input and target respectively. The
subsequent 30 day periods were given as input
and target for testing. The number of
neurons in the hidden layer was increased; however,
no appreciable improvement was seen, hence
the number of neurons was taken as one for all the
subsequent predictions that used one month's data as
input in a single node. Table 2 gives the results for
prediction durations of one month, two months and
three months using one month of input data.


TABLE 2 MEAN SQUARE ERROR ('MSE') AND COEFFICIENT OF
CORRELATION ('R') VALUES FOR TIDE LEVEL PREDICTIONS OF ONE
MONTH, TWO MONTHS AND THREE MONTHS.

Network      'mse'                   'r'
structure    Training    Testing     Training    Testing
1-1-1        331.77      529.50      0.928       0.884
1-1-2        735.54      917.36      0.830       0.802
1-1-3        1121.20     1142.0      0.728       0.716


Monthly prediction was also carried out using the first
month of the year 2009 as input and the corresponding
month of the year 2010 as target for training; the first
month of the year 2010 was then taken as input and the
corresponding month of 2011 as target for testing.
The results obtained were less
satisfactory, with 'r' values of 0.316 and 0.304 during
training and testing of the network respectively. This
might be due to the time gap of 11 months between the
input and target values.

Preliminary analysis of the monthly
averaged values of the three years of hourly tide
level data showed that, from January till August, the
average monthly tide level decreased and, from
September till January, the levels showed a rising trend
during all three years (Fig. 4). Hence monthly
prediction was carried out on the basis of this analysis by
dividing the data into two sets, one consisting of
the tide levels of the months from January till August and
the other of the months from September till
December of all three years. Predictions were carried
out separately for each data set; the data of the years
2009 and 2010 were given as input and target for
training, and the tide levels of 2010 and 2011
were given as input and target for testing
the network. Good results were obtained for
data set one, which comprised the months from January to
August, and satisfactory results were obtained when
the September to December data were used for
prediction. This decrease in prediction accuracy might
be due to the fact that, in the data set comprising the tide
levels from January till August, there is a time gap of
four months between input and target values, whereas
in the second case there is a time gap of eight months;
moreover, the number of data points available for
network generalisation in the second case is half that
available in the first. The optimum numbers of neurons
were found to be ten for the first data set and five for the
second, which might be due to the greater number of
input and output nodes (eight) in the first case against
just four in the second. The results are tabulated in
Table 3.



FIG. 4 PLOT OF MONTHLY AVERAGED TIDE LEVEL OF THE 
YEAR 2009, 2010 AND 2011. 

TABLE 3 'MSE' AND 'R' VALUES FOR TIDE LEVEL PREDICTIONS OF 8
MONTH AND 4 MONTH DURATION USING INPUT AND OUTPUT DURATION
OF SAME LENGTH AND MONTHS OF CONSECUTIVE YEARS.

Network      'mse'                   'r'
structure    Training    Testing     Training    Testing
8-10-8       613.50      1527.7      0.865       0.690
4-5-4        1587.0      1828.9      0.586       0.538


A year's hourly data set contains 8760 readings;
however, this was cut short to 8640 and 8736 readings
so that the data could be divided equally into 12 months
of 30 days (720 readings) each and 52 weeks
of 7 days (168 readings) each respectively. Hence the
first month comprises data from 1/1/2011 to 30/1/2011,
the second month from 31/1/2011 to 1/3/2011 and so on
till 26/12/2011.
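
The truncation and division described above amount to dropping the readings that do not fill a whole block. A minimal sketch, assuming the hourly readings are held in a Python list and using a hypothetical helper `split_into_blocks`:

```python
def split_into_blocks(readings, block_length):
    """Drop trailing readings that do not fill a whole block, then split."""
    n_blocks = len(readings) // block_length
    usable = readings[: n_blocks * block_length]
    return [
        usable[i * block_length:(i + 1) * block_length]
        for i in range(n_blocks)
    ]


if __name__ == "__main__":
    year = list(range(8760))  # placeholder for one year of hourly readings
    months = split_into_blocks(year, 30 * 24)  # 720 readings per block
    weeks = split_into_blocks(year, 7 * 24)    # 168 readings per block
    print(len(months), sum(len(m) for m in months))
    print(len(weeks), sum(len(w) for w in weeks))
```

Each block then feeds one input or output node, which is how a year of data yields 12 nodes in the monthly arrangement and 52 nodes in the weekly one.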



FIG. 5 GRAPH SHOWING THE PREDICTED AND OBSERVED
HOURLY TIDE LEVEL VALUES FOR THE YEAR 2011 USING
MONTHLY DATA SETS (1/1/2011-26/12/2011). X-AXIS: TIME IN HOURS.

FIG. 6 GRAPH SHOWING THE PREDICTED AND OBSERVED
HOURLY TIDE LEVEL VALUES FOR THE YEAR 2011 USING
WEEKLY DATA SETS (1/1/2011-30/12/2011).

Prediction for an entire year was carried out using the
12 months of data of the year 2009 as input, with an equal
number of input nodes, and the tide levels of the 12
months of 2010 as target during the training process. The
trained network was then used to predict the 2011
tide levels using the 2010 tide levels as input. The
results obtained were very good in this case: 'mse'
values as low as 192.98 and 536.74 were obtained for
training and testing respectively, along with high 'r'
values of 0.959 and 0.891. The optimum number
of neurons in the hidden layer was found to be 10 in this
case. The plot of observed and predicted values
is shown in Fig. 5. Fig. 7 and Fig. 8 give the
scatter plots of training and testing of the network.
Similarly, year-long predictions made using the 52 weeks
of data of the year 2009 as input and the data of the year
2010 as target in the training process, and the data of the
years 2010 and 2011 as input and target for testing,
yielded very good results, with the training
'r' value reaching 0.99 and the testing 'r'
value reaching 0.897. Once again the
optimum number of neurons was found to be ten.
The 'mse' values were 14.13 and
478.47 for training and testing respectively.



FIG. 7 SCATTER PLOT OF TRAINING OF NETWORK DURING
YEARLY PREDICTIONS DONE USING MONTHLY DATA SETS

FIG. 8 SCATTER PLOT OF TESTING OF NETWORK DURING
YEARLY PREDICTIONS DONE USING MONTHLY DATA SETS

FIG. 9 SCATTER PLOT OF TRAINING OF NETWORK DURING
YEARLY PREDICTIONS DONE USING WEEKLY DATA SETS

FIG. 10 SCATTER PLOT OF TESTING OF NETWORK DURING
YEARLY PREDICTIONS DONE USING WEEKLY DATA SETS. X-AXIS: TARGET (T).

TABLE 4 VARIATION OF 'MSE' AND 'R' VALUES WITH PHASE LAGS FOR
TIDE LEVEL PREDICTIONS OF THE YEAR 2011 USING THE ARIMA MODEL.

Phase lag    'mse'       'r'
0            2290.87     0.237
1            2418.77     0.059
2            2206.09     0.303
3            1963.73     0.437
4            1961.43     0.438
5            2188.46     0.314
6            2401.20     0.106
7            2390.43     0.125
8            2193.99     0.311
9            2048.65     0.395
10           2130.75     0.350
11           2349.24     0.381
12           2146.80     0.071
13           2146.38     0.341
14           1672.81     0.558
15           1367.30     0.661
16           1507.30     0.615
17           1993.18     0.422
18           2391.70     0.119
19           2301.38     0.231
20           1696.62     0.548
21           1012.00     0.763
22           762.20      0.828
23           1133.60     0.730
24           1831.58     0.495


The results obtained from the ANN were compared
with those of the ARIMA model. In the ARIMA modelling,
the same data given for simulation with ANN were
used. Tide levels of the year 2010 were given as the
predictor series and those of 2011 as the series to be
predicted. Mean squared error ('mse') and coefficient of
correlation ('r') were taken as performance indicators to
compare the results with those of ANN. Predictions
were carried out for time lags varying from one hour
to 24 hours. The results are presented in
Table 4. The best result was obtained for the prediction
with a time lag of 22 hours, with an 'mse' value of 762.201
and an 'r' value of 0.828. These values are considerably
poorer than those obtained by the ANN model ('mse' =
478.47 and 'r' = 0.897) when predictions were carried
out using year-long weekly data sets.

Conclusions 

ANN has been widely used in the field of coastal
engineering for various applications. In this
study, ANN has been used to predict hourly tide
levels at Karwar station, located on the west coast of
India, and the results have been compared with those
of the standard ARIMA model. From the study,
the following conclusions can be drawn:

1. The results of the study were in good
agreement with previous studies in that the
larger the data set used for model training, the
better the network created.

2. Too many neurons in the hidden layer
lead to overfitting of the network, causing poor
predictions when new testing data sets are fed
into the network. It is therefore advisable to
determine the optimum number of neurons whenever
new data sets with a different number of input
and output nodes are used.

3. Prediction of one month, two months and three
months using one month's tide level as input
yielded good results, with training 'r' values greater
than 0.9, 0.8 and 0.7 respectively. However,
monthly predictions using a month's data of one
year as input and the same month's tide level of
the next year as target yielded poor results, as
this introduced a time gap of 11 months between
input and target data. This was evident from the
increase in 'r' value for the predictions carried
out for the four months from September till
December using the same months' tide level data
of the previous year as input in the training of
the network, as the time gap reduced to 8 months
in this case, and from the further increase in 'r'
value for the predictions from January till
August, which reduced the time gap to four
months.

4. Satisfactory results were obtained for a
complete one year's prediction based on the
previous year's data, which gave 'r' values
greater than 0.95 and 0.89 for training and
testing of the network respectively, both when
the data were divided into monthly data sets and
when they were divided into weekly data sets.

5. ANN outperformed the ARIMA model with a 22
hour phase lag in terms of 'mse' and 'r' values,
justifying the application of ANN for the
purpose even without phase lag correction.

The study showed that ANN can be used successfully for
the prediction of tides. However, for
prediction with less data, or with data separated by large
time gaps, further studies can be undertaken using
advanced networks such as dynamic networks.